14 research outputs found

    LBNL-57677 Grid Collector: Facilitating Efficient Selective Access from Data Grids

    No full text
    Abstract — The Grid Collector is a system that facilitates the effective analysis and spontaneous exploration of scientific data. It combines an efficient indexing technology with a Grid file management technology to speed up common analysis jobs on high-energy physics data and to enable some previously impractical analysis jobs. To analyze a set of high-energy collision events, one typically specifies the files containing the events of interest, reads all the events in the files, and filters out unwanted ones. Since most analysis jobs filter out significant number of events, a considerable amount of time is wasted by reading the unwanted events. The Grid Collector removes this inefficiency by allowing users to specify more precisely what events are of interest and to read only the selected events. This speeds up most analysis jobs. In existing analysis frameworks, the responsibility of bringing files from tertiary storages or remote sites to local disks falls on the users. This forces most of analysis jobs to be performed at centralized computer facilities where commonly used files are kept on large shared file systems. The Grid Collector automates file management tasks and eliminates the labor-intensive manual file transfers. This makes it much easier to perform analyses that require data files on tertiary storages and remote sites. It also makes more computer resources available for analysis jobs since they are no longer bound to the centralized facilities

    LBNL-2164E FastBit: Interactively Searching Massive Data

    No full text
    As scientific instruments and computer simulations produce more and more data, the task of locating the essential information to gain insight becomes increasingly difficult. FastBit is an efficient software tool to address this challenge. In this article, we present a summary of the key underlying technologies, namely bitmap compression, encoding, and binning. Together these techniques enable FastBit to answer structured (SQL) queries orders of magnitude faster than popular database systems. To illustrate how FastBit is used in applications, we present three examples involving a high-energy physics experiment, a combustion simulation, and an accelerator simulation. In each case, FastBit significantly reduces the response time and enables interactive exploration on terabytes of data.
    corecore